1,489 research outputs found

    Food-chain competition influences gene's size

    We analysed the effect of Bak-Sneppen predator-prey food-chain self-organization on the nucleotide content of evolving species. In our model, the genome of each species is represented by its genomic nucleotide fraction, and we apply the two-parameter Kimura substitution model to describe how that fraction changes over time. The initial nucleotide fractions and substitution rates were drawn from a random number generator. The deviation of the genomic nucleotide fraction from its equilibrium value plays the role of the fitness parameter, B, in the Bak-Sneppen model. We find that the higher the threshold fitness during the course of evolution, the more frequent are large fluctuations in the number of species with strongly differentiated nucleotide content, and the more often the oldest species, which survive the food-chain competition, may have a specific nucleotide fraction that makes it possible to generate long genes. Comment: 11 pages including 7 figures
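For orientation, the update rule of the standard Bak-Sneppen model can be sketched as follows. This is a minimal illustration, not the model of the abstract above: it assumes species on a ring with fitness values drawn uniformly at random, whereas the paper derives fitness from the deviation of a nucleotide fraction under Kimura substitutions. The function names `bak_sneppen_step` and `simulate` are hypothetical.

```python
import random


def bak_sneppen_step(fitness, rng):
    """Apply one Bak-Sneppen update in place and return the index of the weakest site.

    The least-fit species and its two ring neighbours receive fresh random
    fitness values (assumption: uniform on [0, 1), not nucleotide-based).
    """
    n = len(fitness)
    weakest = min(range(n), key=fitness.__getitem__)
    for offset in (-1, 0, 1):
        fitness[(weakest + offset) % n] = rng.random()
    return weakest


def simulate(n_species=50, n_steps=2000, seed=0):
    """Run the update rule n_steps times from a random initial state."""
    rng = random.Random(seed)
    fitness = [rng.random() for _ in range(n_species)]
    for _ in range(n_steps):
        bak_sneppen_step(fitness, rng)
    return fitness


if __name__ == "__main__":
    final = simulate()
    print(f"lowest fitness after the run: {min(final):.3f}")
```

Only three sites change per step, so any large-scale fluctuations arise from chains of such local replacements rather than from a global update.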

    Different evolutionary patterns between young duplicate genes in the human genome

    BACKGROUND: Following gene duplication, two duplicate genes may experience relaxed functional constraints or acquire different mutations, and may also diverge in function. Whether the two copies will evolve in different patterns remains unclear, however, because previous studies have reached conflicting conclusions. In order to resolve this issue, by providing a general picture, we studied 250 independent pairs of young duplicate genes from the whole human genome. RESULTS: We showed that nearly 60% of the young duplicate gene pairs have evolved at the amino-acid level at significantly different rates from each other. More than 25% of these gene pairs also showed significantly different ratios of nonsynonymous to synonymous rates (Ka/Ks ratios). Moreover, duplicate pairs with different rates of amino-acid substitution also tend to differ in the Ka/Ks ratio, with the fast-evolving copy tending to have a slightly higher Ks than the slow-evolving one. Lastly, a substantial portion of fast-evolving copies have accumulated amino-acid substitutions evenly across the protein sequences, whereas most of the slow-evolving copies exhibit uneven substitution patterns. CONCLUSIONS: Our results suggest that duplicate genes tend to evolve in different patterns following the duplication event. One copy evolves faster than the other and accumulates amino-acid substitutions evenly across the sequence, whereas the other copy evolves more slowly and accumulates amino-acid substitutions unevenly across the sequence. Such different evolutionary patterns may be largely due to different functional constraints on the two copies.

    Shape restricted regression with random Bernstein polynomials

    Shape restricted regressions, including isotonic regression and concave regression as special cases, are studied using priors on Bernstein polynomials and Markov chain Monte Carlo methods. These priors have large supports, select only smooth functions, can easily incorporate geometric information into the prior, and can be generated without computational difficulty. Algorithms generating priors and posteriors are proposed, and simulation studies are conducted to illustrate the performance of this approach. Comparisons with the density-regression method of Dette et al. (2006) are included. Comment: Published at http://dx.doi.org/10.1214/074921707000000157 in the IMS Lecture Notes Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org)
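One reason Bernstein polynomials suit shape restrictions is that constraints on the coefficients carry over to the curve: nondecreasing coefficients give a nondecreasing polynomial on [0, 1]. The sketch below only evaluates such a polynomial; it does not implement the priors or the MCMC sampler of the abstract above, and the function name `bernstein` is a hypothetical choice.

```python
from math import comb


def bernstein(coeffs, x):
    """Evaluate sum_k c_k * C(n, k) * x^k * (1 - x)^(n - k) for x in [0, 1].

    The degree n is len(coeffs) - 1. If the coefficients are nondecreasing,
    the resulting function is nondecreasing on [0, 1].
    """
    n = len(coeffs) - 1
    return sum(
        c * comb(n, k) * x**k * (1 - x) ** (n - k)
        for k, c in enumerate(coeffs)
    )


if __name__ == "__main__":
    increasing = [0.0, 0.1, 0.5, 0.9, 1.0]  # illustrative coefficients only
    for i in range(6):
        x = i / 5
        print(f"x = {x:.1f}  B(x) = {bernstein(increasing, x):.4f}")
```

The curve starts at the first coefficient and ends at the last, so endpoint values can also be fixed directly through the coefficient vector.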

    Overlapping genes in the human and mouse genomes

    BACKGROUND: Increasing evidence suggests that overlapping genes are much more common in eukaryotic genomes than previously thought. In this study we identified and characterized the overlapping genes in a set of 13,484 pairs of human-mouse orthologous genes. RESULTS: About 10% of the genes under study are overlapping genes, the majority of which are different-strand overlaps. The majority of the same-strand overlaps are embedded forms, whereas most different-strand overlaps are not embedded and in the convergent transcription orientation. Most of the same-strand overlapping gene pairs show at least a tenfold difference in length, much larger than the length difference between non-overlapping neighboring gene pairs. The length difference between the two different-strand overlapping genes is less dramatic. Over 27% of the different-strand-overlap relationships are shared between human and mouse, compared to only ~8% conservation for same-strand-overlap relationships. More than 96% of the same-strand and different-strand overlaps that are not shared between human and mouse have both genes located on the same chromosomes in the species that does not show the overlap. We examined the causes of transition between the overlapping and non-overlapping states in the two species and found that 3' UTR change plays an important role in the transition. CONCLUSION: Our study contributes to the understanding of the evolutionary transition between overlapping genes and non-overlapping genes and demonstrates the high rates of evolutionary changes in the untranslated regions.

    Multidimensional scaling for large genomic data sets

    BACKGROUND: Multi-dimensional scaling (MDS) aims to represent high dimensional data in a low dimensional space while preserving the similarities between data points. This reduction in dimensionality is crucial for analyzing and revealing the genuine structure hidden in the data. For noisy data, dimension reduction can effectively reduce the effect of noise on the embedded structure. For large data sets, dimension reduction can effectively reduce information retrieval complexity. Thus, MDS techniques are used in many applications of data mining and gene network research. However, although a number of studies have applied MDS techniques to genomics research, the number of analyzed data points was restricted by the high computational complexity of MDS. In general, a non-metric MDS method is faster than a metric MDS, but it does not preserve the true relationships. The computational complexity of most metric MDS methods is over O(N^2), so that it is difficult to process a data set with a large number of genes N, such as in the case of whole genome microarray data. RESULTS: We developed a new rapid metric MDS method with a low computational complexity, making metric MDS applicable for large data sets. Computer simulation showed that the new method of split-and-combine MDS (SC-MDS) is fast, accurate and efficient. Our empirical studies using microarray data on the yeast cell cycle showed that the performance of K-means in the reduced dimensional space is similar to or slightly better than that of K-means in the original space, and that the clustering results are obtained about three times faster. Our clustering results using SC-MDS are more stable than those in the original space. Hence, the proposed SC-MDS is useful for analyzing whole genome data. CONCLUSION: Our new method reduces the computational complexity from O(N^3) to O(N) when the dimension of the feature space is far less than the number of genes N, and it successfully reconstructs the low dimensional representation as does the classical MDS. Its performance depends on the grouping method and the minimal number of the intersection points between groups. Feasible grouping methods are suggested; each group must contain both neighboring and far apart data points. Our method can represent a large, high dimensional data set in a low dimensional space not only efficiently but also effectively.
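For reference, the classical MDS that the abstract above uses as its baseline can be sketched in a few lines. This is a textbook illustration restricted to a single embedding axis, not the SC-MDS method: it double-centres the squared distances and extracts the leading eigenvector by power iteration. The function name `classical_mds_1d` and the choice of power iteration are assumptions made to keep the example dependency-free.

```python
import math


def classical_mds_1d(dist, n_iter=200):
    """Embed points on one axis from a symmetric distance matrix (classical MDS)."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J D^2 J is the Gram matrix of the centred points.
    b = [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean) for j in range(n)]
        for i in range(n)
    ]
    # Power iteration for the leading eigenvector of B.
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # all points coincide
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    scale = math.sqrt(max(eigval, 0.0))
    return [scale * x for x in v]


if __name__ == "__main__":
    # Distances between three collinear points at positions 0, 1 and 3.
    d = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
    print(classical_mds_1d(d))
```

The full N-by-N matrix is built and multiplied repeatedly here, which illustrates why methods that avoid working on all pairwise distances at once become attractive for large N.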

    Computational reconstruction of transcriptional regulatory modules of the yeast cell cycle

    BACKGROUND: A transcriptional regulatory module (TRM) is a set of genes that is regulated by a common set of transcription factors (TFs). By organizing the genome into TRMs, a living cell can coordinate the activities of many genes and carry out complex functions. Therefore, identifying TRMs is helpful for understanding gene regulation. RESULTS: Integrating gene expression and ChIP-chip data, we develop a method, called MOdule Finding Algorithm (MOFA), for reconstructing TRMs of the yeast cell cycle. MOFA identified 87 TRMs, which together contain 336 distinct genes regulated by 40 TFs. Using various kinds of data, we validated the biological relevance of the identified TRMs. Our analysis shows that different combinations of a fairly small number of TFs are responsible for regulating a large number of genes involved in different cell cycle phases and that there may exist crosstalk between the cell cycle and other cellular processes. MOFA is capable of finding many novel TF-target gene relationships and can determine whether a TF is an activator, a repressor, or both. Finally, MOFA refines some clusters proposed by previous studies and provides a better understanding of how the complex expression program of the cell cycle is regulated. CONCLUSION: MOFA was developed to reconstruct TRMs of the yeast cell cycle. Many of these TRMs are in agreement with previous studies. Further, MOFA inferred many interesting modules and novel TF combinations. We believe that computational analysis of multiple types of data will be a powerful approach to studying complex biological systems as more genomic resources, such as genome-wide protein activity data and protein-protein interaction data, become available.

    CpG island density and its correlations with genomic features in mammalian genomes

    A systematic analysis of CpG islands in ten mammalian genomes suggests that an increase in chromosome number elevates GC content and prevents loss of CpG islands

    Identifying regulatory targets of cell cycle transcription factors using gene expression and ChIP-chip data

    BACKGROUND: ChIP-chip data, which indicate binding of transcription factors (TFs) to DNA regions in vivo, are widely used to reconstruct transcriptional regulatory networks. However, the binding of a TF to a gene does not necessarily imply regulation. Thus, it is important to develop methods to identify regulatory targets of TFs from ChIP-chip data. RESULTS: We developed a method, called Temporal Relationship Identification Algorithm (TRIA), which uses gene expression data to identify a TF's regulatory targets among its binding targets inferred from ChIP-chip data. We applied TRIA to yeast cell cycle microarray data and identified many plausible regulatory targets of cell cycle TFs. We validated our predictions by checking the enrichments for functional annotation and known cell cycle genes. Moreover, we showed that TRIA performs better than two published methods (MA-Network and MFA). It is known that co-regulated genes may not be co-expressed. TRIA has the ability to identify subsets of highly co-expressed genes among the regulatory targets of a TF. Different functional roles are found for different subsets, indicating the diverse functions a TF could have. Finally, as a control, we showed that TRIA also performs well for TFs unrelated to the cell cycle. CONCLUSION: Finding the regulatory targets of TFs is important for understanding how cells change their transcription program to adapt to environmental stimuli. Our algorithm TRIA is helpful for achieving this purpose.